Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset
Abstract
Classification algorithms struggle to build reliable models on very large datasets. Such datasets often contain many irrelevant and redundant features that mislead classifiers. Furthermore, many of them have an imbalanced class distribution, which biases the classification process toward the majority class. This paper proposes combining unsupervised dimensionality reduction with resampling and tests the approach on the LungCancer dataset. In the first step, PCA is applied to the LungCancer dataset to compact it and eliminate irrelevant features. In the second step, SMOTE resampling is carried out to balance the class distribution and increase the variety of the sample domain. Finally, a Naïve Bayes classifier is applied to the resulting dataset, and evaluation metrics are calculated and compared. The experiments show the effectiveness of the proposed method across four evaluation metrics: overall accuracy, false positive rate, precision, and recall.
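The three steps in the abstract (PCA, then SMOTE, then Naïve Bayes, scored on four metrics) can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the authors' implementation: the synthetic data from the hypothetical helper make_toy_data stands in for the LungCancer dataset, the function names, the number of components, and k=5 neighbours are all assumptions, and the minority class is assumed to be labelled 1. The sketch also resamples only the training split, which is a choice made here and not something the abstract specifies.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its top principal components (SVD of the centred data)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    return (X - mean) @ components.T, mean, components

def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples, each interpolated between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)          # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None]
                          + diff ** 2 / self.var_[None]).sum(axis=2)
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]

def binary_metrics(y_true, y_pred, positive=1):
    """Overall accuracy, false positive rate, precision, and recall."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

def make_toy_data(seed=0):
    """Hypothetical stand-in data: 200 majority and 30 minority samples."""
    rng = np.random.default_rng(seed)
    X_maj = rng.normal(0.0, 1.0, size=(200, 20))
    X_min = rng.normal(0.0, 1.0, size=(30, 20))
    X_min[:, :3] += 2.0                      # shift a few features for class 1
    X = np.vstack([X_maj, X_min])
    y = np.array([0] * 200 + [1] * 30)
    idx = rng.permutation(len(y))
    return X[idx], y[idx]

def run_pipeline(X_train, y_train, X_test, y_test, n_components=5, seed=0):
    # Step 1: PCA fitted on the training split, then applied to the test split.
    Z_train, mean, comps = pca_fit_transform(X_train, n_components)
    Z_test = (X_test - mean) @ comps.T
    # Step 2: SMOTE until the minority class (label 1) matches the majority.
    minority = Z_train[y_train == 1]
    n_new = int(np.sum(y_train == 0)) - len(minority)
    Z_bal = np.vstack([Z_train, smote(minority, n_new, rng=seed)])
    y_bal = np.concatenate([y_train, np.ones(n_new, dtype=y_train.dtype)])
    # Step 3: Naive Bayes on the balanced data, scored on the held-out split.
    model = GaussianNaiveBayes().fit(Z_bal, y_bal)
    return binary_metrics(y_test, model.predict(Z_test)), y_bal

X, y = make_toy_data()
metrics, y_balanced = run_pipeline(X[:160], y[:160], X[160:], y[160:])
print(metrics)
```

In practice the hand-written pieces would likely be replaced by library versions such as scikit-learn's PCA and GaussianNB and imbalanced-learn's SMOTE. The NumPy version is used here only to keep each step visible.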
Similar resources
Predicting Survival of Patients with Lung Cancer Using Improved Adaptive Neuro-Fuzzy Inference System
Introduction: Lung cancer is the main cause of mortality in both genders worldwide. This disease is caused by the uncontrollable growth and development of cells in both or one of the lungs. Although the early diagnosis of this cancer is not an easy task, the earlier it is diagnosed, the higher will be the chance of treating. The objective of this study was to develop an optimized prediction mod...
Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem
Application of data mining methods as a decision support system has a great benefit to predict survival of new patients. It also has a great potential for health researchers to investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...
ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION
With an advance in technologies, different tumor features have been collected for Breast Cancer (BC) diagnosis, processing of dealing with large data set suffers some challenges which include high storage capacity and time require for accessing and processing. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...
Computer-Aided Lung Nodule Recognition by SVM Classifier Based on Combination of Random Undersampling and SMOTE
In lung cancer computer-aided detection/diagnosis (CAD) systems, classification of regions of interest (ROI) is often used to detect/diagnose lung nodule accurately. However, problems of unbalanced datasets often have detrimental effects on the performance of classification. In this paper, both minority and majority classes are resampled to increase the generalization ability. We propose a nove...
Journal: CoRR
Volume: abs/1403.1949
Publication date: 2013